
For curves, finding the best-fitting curve is a very complicated mathematical problem. What's nice about straight-line regression is that it's so simple that you can calculate the least-squares parameters from explicit formulas. If you're interested (or if your professor insists that you're interested), we present a general outline of how those formulas are derived.

Think of a set of data containing $X_i$ and $Y_i$, in which $i$ is an index that identifies each observation in the set, as described in Chapter 2. From those data, SSQ can be calculated like this:

$$\mathrm{SSQ} = \sum_i \left( Y_i - (a + bX_i) \right)^2$$
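To make that concrete, here's a minimal Python sketch (the X and Y values are made up for illustration; they're not from this book) that evaluates SSQ for one candidate line:

```python
# Hypothetical data, purely for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

# One candidate intercept (a) and slope (b); not necessarily the best fit
a, b = 0.1, 2.0

# SSQ = sum over i of (Y_i - (a + b*X_i))^2
ssq = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
print(ssq)
```

Different choices of $a$ and $b$ give different SSQ values; the least-squares line is the one that makes SSQ as small as possible.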

If you’re good at first-semester calculus, you can find the values of a and b that

minimize SSQ by setting the partial derivatives of SSQ with respect to a and b

equal to 0. If you stink at calculus, trust that this leads to these two simultaneous

equations:

$$aN + b\left(\sum X\right) = \left(\sum Y\right)$$

$$a\left(\sum X\right) + b\left(\sum X^2\right) = \left(\sum XY\right)$$

where N is the number of observed data points.
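If you'd rather let the computer do the calculus, here's a small sketch (our own illustration, using Python's SymPy library) that takes those partial derivatives symbolically for $N = 3$ points; expanding them gives exactly the two equations above:

```python
import sympy as sp

a, b = sp.symbols('a b')
X = sp.symbols('X1 X2 X3')  # N = 3 points, to keep the output short
Y = sp.symbols('Y1 Y2 Y3')

# SSQ as defined earlier in this section
SSQ = sum((Yi - (a + b * Xi)) ** 2 for Xi, Yi in zip(X, Y))

# Each partial derivative, set equal to 0, is one normal equation:
# d(SSQ)/da = 0  ->  2*(N*a + b*Sum(X) - Sum(Y))         = 0
# d(SSQ)/db = 0  ->  2*(a*Sum(X) + b*Sum(X^2) - Sum(XY)) = 0
print(sp.expand(sp.diff(SSQ, a)))
print(sp.expand(sp.diff(SSQ, b)))
```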

These equations can be solved for a and b:

$$a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{N\left(\sum X^2\right) - \left(\sum X\right)^2}$$

$$b = \frac{\left(\sum XY\right) - a\left(\sum X\right)}{\sum X^2}$$

See Chapter 2 if you don't feel comfortable reading the mathematical notations or expressions in this section.
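As a concrete check, here's a short Python sketch (using the same made-up data as our earlier illustration) that plugs the summations into those two formulas:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical X values
ys = [2.1, 3.9, 6.2, 8.1, 9.8]  # hypothetical Y values

N      = len(xs)
sum_x  = sum(xs)
sum_y  = sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Intercept from the first formula, then slope from the second
a = (sum_y * sum_xx - sum_x * sum_xy) / (N * sum_xx - sum_x ** 2)
b = (sum_xy - a * sum_x) / sum_xx

print(a, b)  # 0.14 and 1.96 for these illustrative numbers
```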

Running a Straight-Line Regression

Even if it's possible, calculating regressions manually or with a calculator is not a good idea. You'll go crazy trying to evaluate all those summations and other calculations, and you'll almost certainly make a mistake somewhere along the way.
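Instead, let software do the work. As one example (a minimal sketch using Python's SciPy library, with the same made-up data as before), a single call fits the whole line:

```python
from scipy import stats

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # same hypothetical data as earlier
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

result = stats.linregress(xs, ys)
print(result.intercept, result.slope)  # matches a = 0.14 and b = 1.96
```

Besides the intercept and slope, linregress also reports the correlation coefficient, the p value, and standard errors, all without any hand calculation.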